ABSTRACT

Originally released in 2005, BacMap is an electronic, 
interactive atlas of fully sequenced bacterial 
genomes. It contains fully labeled, zoomable and 
searchable chromosome maps for essentially all 
sequenced prokaryotic (archaebacterial and eubac- 
terial) species. Each map can be zoomed to the level 
of individual genes and each gene is hyperlinked to a 
richly annotated gene card. The latest release of 
BacMap (http://bacmap.wishartlab.com/) now 
contains data for more than 1700 bacterial species 
(~10x more than the 2005 release), corresponding to
more than 2800 chromosome and plasmid maps. All 
bacterial genome maps are now supplemented with 
separate prophage genome maps as well as 
separate tRNA and rRNA maps. Each bacterial 
chromosome entry in BacMap also contains graphs 
and tables on a variety of gene and protein statistics. 
Likewise, every bacterial species entry contains a 
bacterial 'biography' card, with taxonomic details, 
phenotypic details, textual descriptions and images 
(when available). Improved data browsing and 
searching tools have also been added to allow 
more facile filtering, sorting and display of the 
chromosome maps and their contents. 

INTRODUCTION 

When the first bacterial genome was completed in 1995 it 
took more than a year of sequencing effort and cost nearly 
$2 million (1,2). Today it is possible to sequence, assemble
and even annotate an entire bacterial genome in less than 
a day, at a cost of just a few hundred dollars (3). The ease 
with which bacterial genomes can be sequenced has led to 
an explosion of microbial sequences being assembled and
deposited into various databases. Currently, GenBank (4)
lists more than 7000 prokaryotic genomes with 1790 (as of
27 October 2011) fully completed bacterial and archaebacterial
genomes and 5230 genomes marked as 'in progress'
(with ~1/3 of these having draft sequences available).
Never before has so much genome-scale information 
been available about so many different bacterial species. 
A growing challenge, therefore, is to find ways to better 
manage, display and compare this mountain of sequence 
data. 

Over the past decade a number of excellent visualization 
tools have been developed for these purposes, such as 
CGView (5), BASys (6) and DNAPlotter (7). These
programs can create colorful, annotated, interactive
circular genome maps that are well suited to displaying
bacterial genomes. In addition, tools such as Circos (8) and
Bluejay (9) have been developed to allow users to create 
colorful comparative genome maps. At the same time that 
these visualization tools were being developed, several 
superb whole-genome resources emerged that nicely 
integrated gene, genome, phenotypic and taxonomic infor- 
mation together. Some of these databases include the 
GenBank Genome Database (4), KEGG Genomes (10), 
PEDANT (11), Integr8 (12), Ensembl Genomes (13),
TIGR's CMR (14), BioCyc (15), GOLD or the 
Genomes Online Database (16), PATRIC (17) and our 
own BacMap (18). 

Originally released in 2005, BacMap differed from
most whole-genome databases in that it was
designed to serve as an electronic atlas rather than as
a pure genome database. As an atlas, BacMap's primary
role was to provide tools and resources to enable users to 
interactively select, display and manipulate bacterial 
genome maps. BacMap proved to be quite popular with 
many researchers in the microbiology community as it 
allowed facile, platform-independent viewing of both the 
structure and genomic content of many popular microbial 
genomes. Indeed, a number of the visualization tools
developed for BacMap have become widely used in the
microbial genomics community (5,6). However, as the
number of sequenced genomes grew and as our access to
computer resources waned, it became increasingly difficult
to keep all components of BacMap current. Fortunately,
additional computer resources have recently become available
and this has allowed us to substantially update and
upgrade BacMap over the past year.

Table 1. Comparison of bacterial and archaebacterial genome resources

Database name                                     BacMap(a)  NCBI genome(b)  KEGG genomes(c)  Integr8(d)               JCVI CMR(e)  PEDANT(f)
No. of Bacterial genomes (as of 27 October 2011)  1671       1671            1370             2592 (incl. draft seqs)  672          811
No. of Archaea genomes (as of 27 October 2011)    119        119             116              106                      48           57
Includes taxonomy                                 Yes        Yes             Yes              Yes                      Yes          Yes (via NCBI)
Sequencing center/source                          Yes        Yes             No               No                       Yes          No
Includes references                               Yes        Yes             Yes              Yes                      Yes          Yes (via NCBI)
Genome statistics                                 Yes        Yes             Yes              Yes                      Yes          No
Statistical charts                                Yes        No              No               Yes                      No           No
Bacterial descriptions                            Yes        Some            No               Some                     No           Some (via NCBI)
Genome map                                        Yes        No              No               No                       Yes          No
tRNA/rRNA map                                     Yes        No              No               No                       No           No
Prophage map                                      Yes        No              No               No                       No           No
Zoomable maps                                     Yes        No              No               No                       No           No
Sortable views                                    Yes        Yes             No               No                       Yes          No
Phenotype filter                                  Yes        No              No               No                       No           No
Data fields per Gene/Prot                         63         7               10               10                       16           16
BLAST query                                       Yes        Yes             No               Yes                      Yes          Yes
Text search                                       Yes        No              Partial          Yes                      Yes          Yes
Precomputed alignments                            No         Yes             No               Yes                      Yes          No
Analytical tools                                  No         Yes             No               No                       Yes          No
Pathway information                               Yes        No              Yes              No                       Yes          No

(a) http://bacmap.wishartlab.com.
(b) http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi.
(c) http://www.genome.jp/kegg/catalog/org_list.html.
(d) http://www.ebi.ac.uk/integr8.
(e) http://cmr.jcvi.org/tigr-scripts/CMR/CmrHomePage.cgi.
(f) http://pedant.gsf.de/.

Here, we describe the major improvements and changes 
made to BacMap, including the expansion of the database 
(by 10x over the 2005 release), the addition of new
genome visualization tools (for displaying prophage and 
tRNA/rRNA genes), the construction of thousands of 
new bacterial 'biography' pages and the redesign of the 
website to improve the ability of users to query, sort or 
select genes, genomes, pathway, taxonomic and/or pheno- 
typic information from the database. With these new 
enhancements along with our improved ability to 
semi-automatically maintain and update this resource, 
we believe BacMap has now become one of the most 
complete, current and comprehensive bacterial genome re- 
sources available (see Table 1 for a detailed comparison 
between BacMap and other commonly used microbial
genome resources). BacMap is available at
http://bacmap.wishartlab.com.

WHAT'S NEW IN BACMAP? 

Details relating to BacMap's overall architecture, layout, 
general querying capabilities, and annotation protocols 
have been described previously (18) and will not be 
reviewed here. Instead, we shall focus primarily on
describing the changes and enhancements made to
BacMap since the last release. More specifically, we will 
describe: (i) the growth and enhancements made to 
BacMap's existing content; (ii) changes to the BacMap 
interface and layout; and (iii) improvements to 
BacMap's data querying and filtering capabilities. 

CONTENT GROWTH AND ENHANCEMENT 

The first release of BacMap contained fully annotated
gene/protein maps from just 177 bacterial species (18).
The latest release of the BacMap database (as of 27
October 2011) contains pre-calculated genome maps from 1790
completed eubacterial and archaebacterial species or
strains, consisting of more than 2880 chromosomes and
plasmids (or replicons). Overall, this represents a ~10-fold
increase in the number of bacterial species in the database.
In the previous version of BacMap, only one type of 
genome map (a gene/protein map) was available for 
each species, which translated to about 300 different 
chromosome or plasmid maps. Now each bacterial 
species in BacMap is associated with three different 
kinds of genome maps: (i) a gene/protein map; (ii) a 
prophage map and (iii) a tRNA/rRNA map. 
Consequently, BacMap now contains more than 5300
pre-calculated bacterial chromosome (>400 kb) maps
and more than 3300 pre-calculated bacterial plasmid
(<400 kb) maps. The decision to include both prophage
and tRNA/rRNA maps in the latest release of BacMap 
was motivated by a number of user requests. It was also 
based on several emerging trends in microbial genomics 
where information about prophage 'species' and 16S 
rRNA sequences is being routinely used to help understand
bacterial evolution, phylogeny and gene transfer.
Certainly, the identification and mapping of prophage sequences
(which can occupy up to 20% of some bacterial
genomes) is an oft-ignored component of many bacterial
annotation efforts.
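
The map totals quoted above can be checked with simple bookkeeping: every replicon is drawn once per map type, and maps are labelled as chromosome or plasmid maps using the 400 kb size cut-off. The Python sketch below only illustrates that arithmetic. The names `classify_replicon` and `total_maps` are hypothetical, and treating a replicon of exactly 400 kb as a chromosome is an assumption, since the text does not say how that boundary case is handled.

```python
# Illustrative bookkeeping only; this is not BacMap's actual code.
MAP_TYPES = ("gene/protein", "prophage", "tRNA/rRNA")
SIZE_CUTOFF_BP = 400_000  # 400 kb cut-off quoted in the text


def classify_replicon(length_bp: int) -> str:
    """Label a replicon by size (assumption: exactly 400 kb counts as a chromosome)."""
    return "chromosome" if length_bp >= SIZE_CUTOFF_BP else "plasmid"


def total_maps(n_replicons: int) -> int:
    """Each replicon is drawn once per map type."""
    return n_replicons * len(MAP_TYPES)


# Lower bounds quoted in the text.
replicons = 2880
expected_maps = total_maps(replicons)  # 3 map types x 2880 replicons
reported_maps = 5300 + 3300            # chromosome maps + plasmid maps

print(expected_maps, reported_maps)
print(classify_replicon(4_600_000), classify_replicon(90_000))
```

Three map types for roughly 2880 replicons gives about 8640 maps, which agrees with the reported total of more than 8600 to within the rounding of the 'more than' figures.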

As with previous versions of BacMap, the gene/protein
maps were generated via CGView (5) and annotated using
BASys (6). The new tRNA/rRNA maps were generated
using existing genome annotations and supplemented
with information from tRNAscan (19), while BacMap's
prophage maps were generated using PHAST (20). Both
the tRNA/rRNA maps and prophage maps are displayed
using a different and somewhat more sophisticated Google
Maps style of graphics. Both of these new BacMap display
tools, which require Adobe Flash, support circular
and linear genomic views as well as interactive browsing
and dynamic image labeling. Additional details about the
display capabilities, methodology and the accuracy of
PHAST's prophage predictions are available in the
PHAST manuscript or on the PHAST website (20).

In addition to the significant growth in the number of
genomes (10x) and map types (3x), there has also been
significant growth in the number (from 80 to 1790), depth
(5 data fields to 38 data fields) and proportion (from 50%
to 100%) of bacterial species with bacterial 'biographies'.
These biography cards contain information on the bacterium's
name(s), accession numbers, taxonomy, subspecies/strain,
date of genome release, sequencing center, completeness,
sequencing quality, sequencing depth, sequencing method,
isolation site/country, number of replicons, chromosome shape,
plasmid shape, gram stain, shape, motility, flagellar
presence, number of membranes, oxygen requirements,
optimal temperature, temperature range, habitat, biotic
relationship, host name, cell arrangement, sporulation
properties, metabolism, energy source, associated diseases
(if any) and pathogenicity. A brief textual description of the
organism covering its physiology, general characteristics,
ecological niche, source, relevance to human or animal
disease and related references is also given. Additionally,
an image of the organism (if available) is provided. This
information was mined from Integr8 (12), the NCBI
BioProjects (4), HAMAP (21), Wikipedia, MicrobeWiki,
Karyn's Genomes, various bacterial genome home pages,
Google Images as well as other sources and manually
edited. Each BacMap biography page or 'BioCard' has
two other tabs that contain a list of metabolic
pathways that occur or are thought to occur in that
organism and a list of references or database hyperlinks.

Each gene in BacMap is linked to a gene card that
provides detailed information about that gene/protein.
The original release of BacMap provided just 11 data
fields for each gene or protein. The latest version now
has an average of 63 data fields, covering a wide range
of information on gene features, protein features, protein
functions, subcellular locations and other relevant data.
The rich annotation for the completed genomes in
BacMap was primarily derived or calculated from
BASys (6). In addition to these BASys annotations,
COG and PEDANT functional classifications (where
available) have been extracted from their respective
online databases (4,11). Overall, the amount or 'depth'
of gene/protein specific data in BacMap has grown by
more than a factor of 5. Given that there are ~5 million
genes in the new release of BacMap (compared to approximately
500000 genes in the original release), this represents
a nearly 50x increase in the quantity of sequence
annotation data.
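
The growth estimate above is the product of two ratios: data fields per gene and number of genes. The short Python calculation below reproduces it from the figures quoted in the text. It assumes, purely for illustration, that every gene carries the average number of fields. The quoted 'nearly 50x' corresponds to the rounded factor of 5, while the unrounded field counts give a somewhat larger value.

```python
# Figures quoted in the text; the product is only a rough estimate.
fields_2005, fields_now = 11, 63
genes_2005, genes_now = 500_000, 5_000_000

depth_growth = fields_now / fields_2005    # more fields per gene
gene_growth = genes_now / genes_2005       # more genes
total_growth = depth_growth * gene_growth  # more annotation values overall

print(f"{depth_growth:.1f}x depth, {gene_growth:.0f}x genes, {total_growth:.0f}x total")
# prints "5.7x depth, 10x genes, 57x total"
```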



INTERFACE CHANGES 

The growth in BacMap's content and size has necessitated 
a number of changes in its interface. The database is still 
easily browseable, but to increase the number of entries 
per page a more compact, tabular display has been 
adopted. As seen in Figure 1, each row in the BacMap 
genome table has nine columns covering: (i) the organism 
name/species; (ii) replicon type (chromosome/plasmid); 
(iii) release date; (iv) number of replicons; (v) GenBank 
identifier; (vi) replicon length; (vii) GC content; (viii) 
Maps; and (ix) Tools. Under the Maps column users 
may click on either the Gene/Protein map (a circle icon), 
the tRNA/rRNA map (a tRNA icon) or the Prophage 
map (a T4 phage icon) to generate the desired interactive 
genome map. Under the Tools column users may select 
the genome statistics link (a bar-graph icon), the 
genome-specific BLAST search (a magnifier icon with a 
chromosome), the genome-specific text search (a magnifier 
icon with a T) or the download link (a green arrow). 
Clicking on the organism name will launch BacMap's
'bacterial biography' card or BioCard, yielding detailed
taxonomic and phenotypic information about the
organism of interest. BacMap's genome table may be
navigated by clicking on page numbers or Previous/Next
arrows marked above the table. Additionally, by clicking
on specific column headings, users may sort the display
according to organism name (alphabetically, which is the
default display), replicon type, date of release, number of
replicons, identifier number, replicon/genome length and
GC content.

Clicking on the map links, stats links, BLAST and text
searches will open up new windows which will produce
genome images or fillable text boxes that are very
similar to those seen in the original version of BacMap.
The BLAST and text search tools associated with each
chromosome or plasmid are specific to that plasmid or
that chromosome. In other words, the searches are
limited to the data in that replicon's BacMap genome
cards. The same zooming and map navigation tools are
still used for the gene/protein maps while the tRNA/
rRNA and prophage maps are navigated the same way
as those described for PHAST. The graph and chart
formats for the BacMap stats links are essentially unchanged
from the previous version of BacMap.

SELECTION AND QUERYING ENHANCEMENTS 

Figure 1. A screenshot montage of the BacMap database showing the different display, browsing and filtering tools.

The large number and diversity of bacterial genomes in
BacMap has also necessitated other changes to the interface
and to the querying tools. BacMap now has
a sophisticated data filtering system located on the left side
of its genome table (Figure 1). Using this tool, users may
filter or select genomes based on their taxonomy and/or 
phenotype. BacMap's taxonomy and phenotype filters 
may be toggled to be displayed or hidden by clicking on 
their respective headings (the default is to display all filter 
options). To enable the selection of different taxonomic 
groupings, users may select any combination of 78 yes/no/ 
NA check boxes under 2 different kingdoms (Bacteria or 
Archaea) or 24 different phyla/classes. Choosing one (say 
Archaea) will reformat the standard BacMap browsing 
table and display only the known archaebacteria in the 
table. This table may then be sorted, as described earlier, 
by clicking on the appropriate column headings. To 
enable the selection of different phenotypic groupings in 
BacMap, users may select any combination of yes/no/NA 
check boxes under 16 different phenotypic headings 
including: (i) Flagella; (ii) Human pathogen; (iii) 
Motility; (iv) Number of membranes; (v) Number of 
chromosomes; (vi) Chromosome shape; (vii) Plasmid 
shape; (viii) Cell shape; (ix) Cell arrangement; (x) Gram 
stain; (xi) Temperature range; (xii) Oxygen requirements; 
(xiii) Biotic relationship; (xiv) Habitat; (xv) Energy source; 
and (xvi) Sporulation. The default is to leave all check
boxes cleared, which allows the full BacMap genome
table to be viewed. If one or more check boxes are
selected, the BacMap genome table will be reformatted
to display only those organisms with the selected
phenotype. As before,
the resulting table may be sorted by clicking the appropri- 
ate column headings. Using these sorting and filtering 
tools, it is now relatively easy for a user to perform a 
query such as: 'Find all Archaebacteria that are 
hyperthermophiles with >55% GC content' or 'Find all 
Proteobacteria that are gram positive and that have 
genome sizes greater than 6 megabases'. 
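
The filter-then-sort workflow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not BacMap's implementation: the `GenomeRow` fields, the `filter_rows` and `sort_rows` helpers, and the rows themselves are invented placeholders rather than real BacMap records.

```python
from dataclasses import dataclass


@dataclass
class GenomeRow:
    organism: str
    kingdom: str            # "Bacteria" or "Archaea"
    temperature_range: str  # e.g. "hyperthermophilic", "mesophilic"
    gc_percent: float
    length_mb: float


# Invented placeholder rows, not real BacMap records.
ROWS = [
    GenomeRow("Example archaeon A", "Archaea", "hyperthermophilic", 57.1, 1.8),
    GenomeRow("Example archaeon B", "Archaea", "hyperthermophilic", 41.0, 2.2),
    GenomeRow("Example archaeon C", "Archaea", "mesophilic", 62.3, 3.1),
    GenomeRow("Example bacterium D", "Bacteria", "mesophilic", 66.0, 6.4),
]


def filter_rows(rows, **selected):
    """Keep rows whose attributes equal every selected check-box value."""
    return [r for r in rows if all(getattr(r, k) == v for k, v in selected.items())]


def sort_rows(rows, column, descending=False):
    """Mimic clicking a column heading to sort the table."""
    return sorted(rows, key=lambda r: getattr(r, column), reverse=descending)


# 'Find all Archaebacteria that are hyperthermophiles with >55% GC content'
subset = filter_rows(ROWS, kingdom="Archaea", temperature_range="hyperthermophilic")
ranked = sort_rows(subset, "gc_percent", descending=True)
hits = [r.organism for r in ranked if r.gc_percent > 55]
print(hits)
```

In the web interface the GC cut-off would be read off the sorted column by eye; the explicit comparison here simply makes that last step concrete.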

BacMap now supports genome-specific BLAST
(22) queries, filtered database BLAST queries and whole
database BLAST queries. The genome-specific BLAST
queries are accessed by clicking on the BLAST hyperlink
for a specific organism or genome in BacMap's genome
table (the last column). The whole BacMap database or
filtered database BLAST queries can be accessed by
clicking on the 'BacMap BLAST' link, located at the
top right of the BacMap genome table. This will
produce a fillable text box for standard protein or gene
sequence queries. If the database has not been filtered (via
taxonomy or phenotype) prior to clicking the BacMap
BLAST link, then the entire database will be searched.
If the database has been filtered, then the BLAST search
will be limited to only those genomes displayed in
the BacMap genome table. The same model also works
for BacMap's text search tool (located adjacent to the
BacMap BLAST link), where users may perform
genome-specific text queries, filtered database text
queries or whole database text queries.
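
The three search scopes described above (genome-specific, filtered database and whole database) amount to choosing which set of genomes a query is run against. The Python sketch below shows one way that choice could be expressed. The function `search_scope`, its parameters and the genome identifiers are hypothetical and are not taken from BacMap.

```python
from typing import Iterable, List, Optional


def search_scope(all_genomes: Iterable[str],
                 displayed_genomes: Optional[Iterable[str]] = None,
                 genome_id: Optional[str] = None) -> List[str]:
    """Return the genome identifiers a query should be run against.

    genome_id set          -> genome-specific search
    displayed_genomes set  -> filtered-database search
    neither set            -> whole-database search
    """
    if genome_id is not None:
        return [genome_id]
    if displayed_genomes is not None:
        return list(displayed_genomes)
    return list(all_genomes)


# Placeholder identifiers, not real accession numbers.
ALL = ["genome_1", "genome_2", "genome_3"]

print(search_scope(ALL))                                  # whole database
print(search_scope(ALL, displayed_genomes=["genome_2"]))  # filtered table
print(search_scope(ALL, genome_id="genome_3"))            # single genome
```

The same scope selection would apply unchanged to text queries, since the text states that both tools follow the same model.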

To help with the speed and specificity of the text
searches, there is now an added filter in the newest
release of BacMap. In particular, users can now choose
to search selected components of the database covering
only the: (i) Genus/Species/Strain names; (ii) Gene/
Protein Names; (iii) text in the bacterial biography cards
('Biocards'); or (iv) text in the metabolic pathway descriptors
('Metabolic Pathway'). This selection is done by using
the pull-down menu located in the text search box. Using
these text searching tools, users may first filter the
database to select 'all Archaea that are hyperthermophiles'
and then from this subset, search for the term 'methanogen'
from the biography card. The result would be a list of
sequenced Archaea that are hyperthermophilic methanogens.
This flexibility in searching and filtering should make
BacMap particularly useful for a wide range of microbiologists
and metagenomics specialists.
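
The two-step query described above (filter first, then search one text component) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the field keys in `SEARCH_FIELDS` stand in for the four pull-down options, `text_search` is a hypothetical helper using simple substring matching, and the records are invented placeholders rather than BacMap entries.

```python
# Hypothetical field keys standing in for the four pull-down options.
SEARCH_FIELDS = ("organism_name", "gene_names", "biocard_text", "pathway_text")

# Invented placeholder records, not real BacMap entries.
RECORDS = [
    {"organism_name": "Example archaeon A", "kingdom": "Archaea",
     "temperature_range": "hyperthermophilic",
     "biocard_text": "A strictly anaerobic methanogen isolated from a vent."},
    {"organism_name": "Example archaeon B", "kingdom": "Archaea",
     "temperature_range": "hyperthermophilic",
     "biocard_text": "A sulfur-reducing organism from a hot spring."},
    {"organism_name": "Example bacterium C", "kingdom": "Bacteria",
     "temperature_range": "mesophilic",
     "biocard_text": "Lives alongside a methanogen in sediment."},
]


def text_search(records, term, field):
    """Case-insensitive substring match restricted to one field."""
    if field not in SEARCH_FIELDS:
        raise ValueError(f"unknown search field: {field}")
    needle = term.lower()
    return [r for r in records if needle in r.get(field, "").lower()]


# Step 1: filter to 'all Archaea that are hyperthermophiles'.
subset = [r for r in RECORDS
          if r["kingdom"] == "Archaea"
          and r["temperature_range"] == "hyperthermophilic"]

# Step 2: search the biography text of that subset only.
hits = text_search(subset, "methanogen", "biocard_text")
print([r["organism_name"] for r in hits])
```

Running the same search without the filtering step would also return the bacterial record, which shows why filtering before searching narrows the result to hyperthermophilic Archaea.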

CONCLUSION 

To summarize, BacMap is a richly annotated, easily 
queried and highly interactive electronic atlas containing 
data from more than 1700 fully sequenced bacterial 
genomes. The latest release of BacMap builds on earlier 
strengths but also adds tremendously to the number of 
annotated genomes, the number and quality of visual 
displays, the amount of phenotypic or organism-specific 
data and the ability for users to display, sort, search and 
filter the data. As shown in Table 1, these changes and 
enhancements have made the latest release of BacMap 
quite comparable to a number of more widely known or 
widely used bacterial genome resources. Furthermore, 
with its unique focus on using easily understood, rapidly 
accessible, interactive visual displays to transmit 
genome-scale information, we anticipate that BacMap 
may have particularly broad appeal among students, edu- 
cators and scientists. With a growing interest in bacterial 
genomics and the growing ease with which bacterial 
genomes can be sequenced, we believe that an up-to-date 
resource such as BacMap is a timely addition and a useful 
contribution to the field. 